(54) Title: METHODS TO REDUCE VARIANCE IN TREATMENT STUDIES USING GENOTYPING
(57) Abstract

The present invention provides methods, computer programs and computerized systems
useful for evaluating the efficacy of various types of treatment procedures (e.g., clinical
trials) as a function of the genotype of a subject. By matching treatment and control
groups genetically, the methods and systems of the invention reduce the total variance of
the study, thereby allowing trials examining the efficacy or effect of treatment procedures to
be conducted with fewer subjects, with increased confidence values, and/or with increased
precision or discriminatory power. Certain methods of the invention involve selecting treated
and control subpopulations of subjects from treated and control populations for similarity in
polymorphic profile, wherein the treated and control populations have been treated with a
treatment and control procedure, respectively. A determination is then made whether there
is a statistically significant difference in a test parameter between the treated and control
subpopulations as an assessment of the test procedure.
METHODS TO REDUCE VARIANCE IN TREATMENT STUDIES
USING GENOTYPING

This application claims the benefit of U.S. Provisional Application No.
60/110,668, filed December 2, 1998, which is incorporated by reference in its entirety
for all purposes.

The present invention resides in the fields of medicine, genetics and statistics.

BACKGROUND OF THE INVENTION

The conduct and design of studies for investigating treatment efficacy, such
as clinical trials, aims to eliminate the bias that can arise from random influences,
be they genetic or environmental, as well as bias introduced by the investigator or
otherwise. One approach for reducing bias is to randomize individuals to either treatment
or control groups with the view that if the individuals in the two groups are unrelated
genetically and live independent of one another, then both genetic and environmental
influences on the trial will be balanced in the two arms of the study. An immediate
consequence of randomization in this way, however, is that the variance of the biological
condition measured is greater than if each case is matched for genetic and known
environmental influences.

One method for determining genetic variability is by assessment of
polymorphic profile. Polymorphisms refer to the coexistence of multiple forms of a
sequence in a population. Several different types of polymorphisms have been reported. A
restriction fragment length polymorphism (RFLP), for example, means a variation in DNA
sequence that alters the length of a restriction fragment (see, e.g., Botstein et al., Am. J.
Hum. Genet. 32:314-331 (1980)). Short tandem repeats (STRs), as the name implies,
consist of tandem di-, tri- and tetra-nucleotide repeat motifs. Such
polymorphisms are also sometimes referred to as variable number tandem repeat (VNTR)
polymorphisms (see, e.g., U.S. Patent No. 5,075,217; Armour et al., FEBS Lett. 307:113-
115 (1992); and Horn et al., WO 91/14003).

By far the most common form of polymorphisms are those involving single
nucleotide variations between individuals of the same species; such polymorphisms are
called single nucleotide polymorphisms, or simply SNPs. Some SNPs that occur in protein
coding regions give rise to the expression of variant or defective proteins, and thus are
potentially the cause of a genetic disease. Even SNPs that occur in non-coding regions can
nonetheless result in defective protein expression (e.g., by causing defective splicing).
Other SNPs have no phenotypic effects.

SUMMARY OF THE INVENTION

Certain methods of the invention are designed to provide an assessment of
the efficacy of a treatment procedure. In general, such methods involve selecting treated
and control subpopulations from treated and control populations of subjects, wherein the
treated population has been treated with a treatment procedure and the control population
has been treated with a control procedure. The subjects in both the treated and control
populations have been characterized for polymorphic profile and are selected because
they have similar polymorphic profiles. A determination is then made whether there is a
statistically significant difference in a test parameter between the treated and control
subpopulations. In general, such a statistically significant difference indicates that there is
a correlation between the type of treatment and one or more polymorphic forms within the
polymorphic profile for which the treated and control subpopulations were selected.

In some instances, especially when a significant difference is not found,
the selecting and determining steps are repeated one or more times. In such additional
[...] differs from the polymorphic profile selected [...] and control group. [...]
polymorphic form, but more typically [...] polymorphic forms or [...].

Some of these methods initially involve treating a treated population and a control
population [...] procedure (e.g., [...] treatment schedule), respectively, [...] efficacy.

[...] computerized methods. For example, [...] a control population [...] the
determining step is then displayed [...] display).

In another aspect, the invention provides various computer systems and
programs. For instance, certain computer products for assessing a treatment procedure
are provided. Some systems include program products that generally include code for
providing or receiving data, wherein the data includes: (1) designations for each member
of a treated population treated according to a treatment procedure and for each member of
a control population treated according to a control procedure, (2) designations for a
polymorphic profile for each member of the treated and control populations, and (3)
designations for a test parameter for each member of the treated and control populations.
The program also includes code for selecting a subpopulation from each of the treatment
and control populations that have a similar polymorphic profile, code for determining
whether there is a statistically significant difference in the test parameter between the
subpopulations and code for displaying an output that indicates whether a statistically
significant difference was found between the subpopulations. The code is typically stored
on a computer readable storage medium.

The invention further provides a computerized system for assessing
treatment procedures. Some systems generally include a memory, a system bus and a
processor. The processor is operatively disposed to provide or receive data, wherein the
data includes: (1) designations for each member of a treated population having been
treated according to a treatment procedure and for each member of a control population
treated according to a control procedure, (2) designations for a polymorphic profile for
each member of the treated and control populations, and (3) designations for a test
parameter for each member of the treated and control populations. The processor is
further disposed to select a subpopulation from each of the treatment and control
populations that have a similar polymorphic profile and determine whether there is a
statistically significant difference in the test parameter between the subpopulations. The
microprocessor is also capable of displaying an output indicating whether a statistically
significant difference was found between the subpopulations.
BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 and 2 depict computer systems for implementing the methods of
the invention.

FIG. 3 is a flowchart of a method of assessing a treatment procedure
according to the present invention.

I. Definitions
the form of a case-control study (for a discrete random variable, the groups being affected
and unaffected individuals) or a single population study where the cause of the degree or
severity of the phenotype is being investigated (for example, a quantitative study can
examine blood pressure, blood glucose, etc.).

A "treatment study" is an inquiry into the effect or influence a particular
treatment procedure has on a biological condition, biological susceptibility or biological
resistance of a subject. The study can be quite structured, formal and extensive in scope,
or can be relatively unstructured and of limited scope. For example, a treatment study can
be a formal clinical trial or study performed on a relatively large group of subjects
wherein the study is performed according to set guidelines (e.g., governmental
regulations). However, the treatment study can also be a preclinical study, a field trial of
a plant population or even an informal study by a scientist, veterinarian or a physician of
the effects of a treatment on relatively few subjects. In a treatment study, the subjects are
divided into several (though often just two) groups. These may represent different dose
ranges or simply the treated and the untreated subjects. In the study, the random variable
is measured after treatment. It may also be measured before treatment if it is a change in
the variable over time that is being investigated (e.g., bone mineral density or blood
pressure). It is preferable that subjects are not undergoing any other treatments for their
pathological condition. However, if such a constraint is unreasonable, the study should be
designed so that subjects in both treated and untreated groups are undergoing the same
alternative treatment. The treatment study can be conducted with any type
of organism, including, for example, animals (including humans), plants, bacteria and
viruses.

A "biological condition" refers to the condition, susceptibility or resistance
of the organism upon which the study seeks to determine whether the treatment procedure
has an effect. Typically, the biological condition is a physical or physiological condition
of the organism. For example, in some instances the biological condition is a
pathological condition (i.e., a physiological state that normally does not exist, such as a
disease for example). Pathological conditions typically studied with the methods of the
invention are those with a minimal environmental variance (e.g., high cholesterol levels in
arteriosclerosis, the test parameter can be serum cholesterol concentration.

The term "variance" refers to variation, scatter, spread or dispersion about
the arithmetic mean. Schematically, the variance is the mean value of the squared
deviations (Armitage, P., Statistical Methods in Medical Research, Blackwell
Scientific, Oxford, United Kingdom (1971)). A large variance indicates large deviations
from the arithmetic mean. For example, if cholesterol level is the test parameter being
measured, a mean cholesterol level is determined. The variance represents the average
squared deviation of all cholesterol levels relative to the mean. Other statistical measures
of spread or dispersion about a mean can also be used. Typically, the distribution of the
test parameter takes the shape of a bell-shaped or a normal (Gaussian) curve. Pictorially,
the invention decreases the variance and thus narrows the bell-shape of the normal curve
or, described mathematically, the distribution becomes leptokurtic.

Typically, the variance is due to dissimilar effects on the subjects that
influence the biological condition being analyzed by statistical methods, e.g., genetic,
environmental and measurement variables. For example, in most treatment studies,
because the same methods are used to measure the biological condition among the entire
population being studied, the variance is due to genetic differences between individual
subjects and the environment in which the subjects live. Examples of environmental
influences include diet, sleep patterns, geographical location and culture.

A "polymorphism" refers to the occurrence of two or more genetically
determined alternative sequences or alleles in a population, generally said to be occurring
at a frequency of greater than 0.1%. A polymorphic marker or site is the locus at which
genetic divergence occurs. Preferred markers have at least two alleles, each occurring at
frequency of greater than 1% in a selected population. A polymorphic locus can be as
small as one base pair. Such a locus is referred to as a single nucleotide polymorphism or
simply SNP.

Polymorphic markers include restriction fragment length polymorphisms,
variable number of tandem repeats (VNTRs), hypervariable regions, minisatellites,
dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence
repeats, and insertion elements such as Alu. The first identified allelic form is arbitrarily
designated as the reference form and other allelic forms are designated as alternative or
variant alleles. The allelic form occurring most frequently in a selected population is
sometimes referred to as the wildtype form or allele and the other forms referred to as
mutant forms or alleles. Diploid organisms can be homozygous or heterozygous for
allelic forms. A diallelic polymorphism has two forms. A triallelic polymorphism has
three forms.

A "single nucleotide polymorphism" occurs at a polymorphic site that is
occupied by a single nucleotide, which is the site of variation between allelic sequences.
The site is usually preceded by and followed by highly conserved sequences of the allele
(e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).

A single nucleotide polymorphism (SNP) usually arises due to substitution
of one nucleotide for another at the polymorphic site. A transition is the replacement of
one purine by another purine or one pyrimidine by another pyrimidine. A transversion is
the replacement of a purine by a pyrimidine or vice versa. Single nucleotide
polymorphisms can also arise from a deletion of a nucleotide or an insertion of a
nucleotide relative to a reference allele.

A "polymorphic profile" refers to one or more polymorphic forms for
which a subject is characterized. A polymorphic form is characterized by identifying
which nucleotide(s) is (are) present at a polymorphic site in a nucleic acid sample
acquired from a subject. The profile includes at least one polymorphic form and
preferably includes a plurality of polymorphic forms, such as at least 5, 10, 20, 30, 40, 50,
60, 70, 80, 90 or 100 polymorphic forms or more. Polymorphic profiles are similar when
the polymorphic profiles being compared share at least one polymorphic form at at least one
polymorphic site. Typically, similar polymorphic profiles share identity of polymorphic
forms in at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or 100% in at least
10, 20, 30, 40, 50, 60, 70, 100, or 500 polymorphic sites. Polymorphic forms are identical
if the nucleotide(s) at a particular polymorphic site are the same. Thus, two polymorphic
profiles each including 10 polymorphic forms are 50% identical if five of the
polymorphic forms in the two profiles are identical. If the organism is diploid, then the
polymorphic forms at each polymorphic site are considered to be identical in two
individuals if both individuals have the same two alleles at the polymorphic site. For
example, an individual having alleles a1 and a2 at polymorphic site A is considered to
have the same profile as an individual having alleles a1 and a2 but not as an individual
having alleles a1 and a1, or a2 and a2, or a1 and a3 and so forth.
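The percent-identity rule for diploid profiles can be illustrated with a short Python sketch. The function names `genotype` and `profile_identity`, the dictionary layout and the example subjects are assumptions made for illustration; the specification does not prescribe a data structure.

```python
def genotype(allele_1, allele_2):
    """Return an order-independent diploid genotype, so (a1, a2) equals (a2, a1)."""
    return tuple(sorted((allele_1, allele_2)))


def profile_identity(profile_a, profile_b):
    """Fraction of commonly typed polymorphic sites at which both genotypes are identical."""
    shared_sites = profile_a.keys() & profile_b.keys()
    if not shared_sites:
        return 0.0
    matches = sum(1 for site in shared_sites if profile_a[site] == profile_b[site])
    return matches / len(shared_sites)


# Hypothetical subjects typed at two polymorphic sites, A and B.
subject_1 = {"A": genotype("a1", "a2"), "B": genotype("b1", "b1")}
subject_2 = {"A": genotype("a2", "a1"), "B": genotype("b1", "b2")}

# Site A matches (same two alleles); site B does not (b1/b1 versus b1/b2).
print(profile_identity(subject_1, subject_2))  # 0.5
```

Storing each genotype as a sorted pair keeps a heterozygote distinct from either homozygote, which is the distinction the a1/a2 example draws.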

The term "linkage" describes the tendency of genes, alleles, loci or genetic
markers to be inherited together as a result of their location on the same chromosome, and
can be measured by percent recombination between the two genes, alleles, loci or genetic
markers.

"Linkage disequilibrium" or "allelic association" means the preferential
association of a particular allele or genetic marker with a specific allele or genetic marker
at a nearby chromosomal location more frequently than expected by chance (see, for
example, Weir, B., Genetic Data Analysis, Sinauer Associates Inc., 1996). For example, if
locus X has alleles a and b, which occur equally frequently, and linked locus Y has alleles
c and d, which occur equally frequently, one would expect the combination ac to occur
with a frequency of 0.25. If ac occurs more frequently, then alleles a and c are in linkage
disequilibrium. Linkage disequilibrium may result from natural selection of certain
combinations of alleles or because an allele has been introduced into a population too
recently to have reached equilibrium with linked alleles.
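The arithmetic in the two-locus example can be written out as a minimal Python sketch. The coefficient D (observed haplotype frequency minus the product of the allele frequencies) is the standard population-genetics measure and is not named in the text above; the function names and the observed frequency of 0.375 are illustrative assumptions.

```python
def expected_haplotype_frequency(freq_a, freq_c):
    """Haplotype frequency expected if the alleles at the two loci associate by chance."""
    return freq_a * freq_c


def disequilibrium_coefficient(freq_ac, freq_a, freq_c):
    """D = observed haplotype frequency minus the frequency expected by chance."""
    return freq_ac - expected_haplotype_frequency(freq_a, freq_c)


# Alleles a and c each occur with frequency 0.5, as in the example above.
print(expected_haplotype_frequency(0.5, 0.5))         # 0.25
# Hypothetical observed frequency of the ac combination.
print(disequilibrium_coefficient(0.375, 0.5, 0.5))    # 0.125
```

A positive D corresponds to the case described above in which ac occurs more often than 0.25; D equal to zero corresponds to equilibrium.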

A marker in linkage disequilibrium can be particularly useful in detecting
susceptibility to disease (or other phenotype) notwithstanding that the marker does not
cause the disease. For example, a marker (X) that is not itself a causative element of a
disease, but which is in linkage disequilibrium with a gene (including regulatory
sequences) (Y) that is a causative element of a phenotype, can be detected to indicate
susceptibility to the disease in circumstances in which the gene Y may not have been
identified or may not be readily detectable.
with isotopes, chromophores, lumiphores, chromogens, or indirectly labeled such as with
biotin to which a streptavidin complex can later bind. By assaying for the presence or
absence of the probe, the presence or absence of the select sequence or sub-sequence can
be detected.

A "label" is a composition detectable by spectroscopic, photochemical,
biochemical, immunochemical, or chemical means. For example, useful labels include
³²P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an
ELISA), biotin, digoxigenin, or haptens and proteins for which antisera or monoclonal
antibodies are available (e.g., by incorporating a radio-label into the peptide, and used to
detect antibodies specifically reactive with the peptide). A label often generates a
measurable signal, such as radioactivity, fluorescent light or enzyme activity, which can
be used to quantitate the amount of bound label.

A "labeled nucleic acid probe" is a nucleic acid probe that is bound, either
covalently, through a linker, or through ionic, van der Waals or hydrogen bonds to a label
such that the presence of the probe can be detected by detecting the presence of the label
bound to the probe.

The phrase "selectively hybridizes to" refers to the binding, duplexing, or
hybridizing of a molecule only to a particular nucleotide sequence under stringent
hybridization conditions when that sequence is present in a complex mixture (e.g., total
cellular) DNA or RNA. The phrase "stringent hybridization conditions" refers to
conditions under which a probe hybridizes to its target subsequence, but to no other
sequences. Stringent conditions are sequence-dependent and are different in different
circumstances. Longer sequences hybridize specifically at higher temperatures. An
extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in
Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, "Overview of
principles of hybridization and the strategy of nucleic acid assays" (1993). Generally,
stringent conditions are selected to be about 5-10 °C lower than the thermal melting point
(Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature
(under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes
II. General

The present invention provides methods, computer programs and
computerized systems useful for designing treatment studies and for evaluating the
efficacy of various types of treatment procedures (e.g., clinical trials) as a function of the
genotype of a subject. The methods of the invention are designed to control for underlying
genetic factors that may influence the response to a treatment. The present invention is
based, in part, on the insight that controlling, either directly or indirectly, genetic factors that
influence a patient's response to treatment can greatly increase the power of the clinical
trial. Some methods are designed to reduce the genetic diversity of the patient population so
as to increase the probability of individuals sharing the same alleles at genes involved in
response to the treatment. In cases where polymorphisms (usually in genes) are known to
be associated with or cause differences in response to the treatment, these polymorphisms
can be used directly in the design of the clinical trial.

For example, the invention provides methods for reducing the variance in
the biological condition or phenotype of interest by controlling for genetic factors
influencing that phenotype. In the context of a clinical trial, the phenotype of interest is
the response to a treatment. Genetic factors can be controlled in a number of different
ways but the principle underlying the methods of the invention can be illustrated by an
example. If the test parameter is measured in two groups, the first (which is of size n) is
treated and the second (of size m) is untreated, the mean and variance of these samples
can be calculated in the standard way (see Armitage & Berry, Statistical Methods in
Medical Research, Blackwell Science, 1995). Thus, for instance, in an example the mean
and variance of the treated group are $\mu_1$ and $s_1^2$, respectively, and the mean and variance
of the untreated group are $\mu_2$ and $s_2^2$, respectively. Then an approximate confidence
interval at level $1 - \alpha$ (where, for example, $1 - \alpha = 0.95$) for the difference in response
between the two groups is given as

$$(\mu_1 - \mu_2) \pm Z_{\alpha/2} \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}$$

where $Z_{\alpha/2}$ is the value of the standard normal distribution that is exceeded by chance
with probability $\alpha/2$.

Hence, any method that decreases the variance in either sample (i.e., which
decreases $s_1^2$ or $s_2^2$) necessarily decreases the size of the confidence interval.
Alternatively, when the variance of one or both of the samples is decreased, the size of the
confidence interval can be held constant with fewer patients enrolled in the trial (i.e., n
and/or m can be reduced). Thus, reducing the variance in response can lead either to
greater certainty of a difference (here encapsulated by a smaller confidence interval) or to
a reduced sample size for the same statistical power. The variance can be reduced in a
number of different ways as described in the following sections.
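The trade-off between variance and enrollment can be made concrete with a short Python sketch based on the usual normal-approximation interval for a difference in means. The function names, the 95% confidence level, the equal-variance and equal-arm-size simplification, and the numerical values are all assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def ci_half_width(var_treated, n, var_control, m, confidence=0.95):
    """Half-width of the approximate normal interval for the difference in group means."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(var_treated / n + var_control / m)


def subjects_per_arm(variance, half_width, confidence=0.95):
    """Smallest equal arm size (n = m) giving at most the requested half-width,
    assuming both arms share the same variance."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(2 * variance * (z / half_width) ** 2)


# Hypothetical case: genetic matching halves the variance of the test parameter.
print(subjects_per_arm(variance=100, half_width=5))  # 31
print(subjects_per_arm(variance=50, half_width=5))   # 16
```

Under these assumptions the required arm size scales in proportion to the variance, so halving the variance roughly halves the enrollment needed for the same interval width.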

A. Selective enrollment of patients. One approach is to control for potentially
confounding factors by increasing the homogeneity of the population. In the context of
genetics, a set of polymorphic markers can be examined in a large group of subjects and
those with similar polymorphic profiles enrolled in the treatment study. Incorporating
genetic factors (represented by the polymorphic profile) into the inclusion/exclusion criteria
of a treatment study allows an experimenter to reduce the variance in response due to
underlying genetic factors.

B. Division of patient population into genetically homogenous subsets. A
second approach is to categorize individuals into subsets depending on how similar the
polymorphic profiles are to one another. Within each subset, subjects are randomly
allocated into treatment or control subpopulations, as they are in a standard clinical trial for
example. This method of dividing the subjects creates subsets that are genetically more
homogenous than a random sample of the same size. This design is equivalent to
conducting several small, independent treatment studies each of which contains patients that
have more similar polymorphic profiles than expected by chance. Many environmental
variables can be manifestations of underlying genetic factors. By examining genetic
polymorphisms directly, it is possible not only to reduce variance due to genetic factors that
are not directly observable, but also to improve the stratification based on environmental
factors that are acting as surrogates for the underlying genetic factors that control them. As
procedure (e.g., administration of a drug) being tested is unlikely to be effective in any
significant portion of the population, and that further research is not justified. If,
however, statistical significance is reached for a particular polymorphic DNA profile, at
least two conclusions follow. First, in the case of a clinical trial on a drug, the drug is
effective in at least a portion of the population, and further development of the drug may
well be justified. Second, one knows the portion of the general population in which the
drug is effective, this portion being defined by a polymorphic profile. This profile can be
used as a diagnostic to identify patients appropriate for treatment when the decision to
treat or a choice of treatments is made.

As an example of a method of the invention, a clinical trial can be carried
out as follows:

1. Identification and choice of polymorphisms.
A set of polymorphisms is identified that allows the division of the patient
cohort into sub-groups. These polymorphisms may be known to be involved in the test
parameter (e.g., the phenotype or endpoint) that is to be measured or can be chosen at
random. (In the latter case, the genetic sub-groups may show identical results with
respect to the phenotype of interest. This implies the method of grouping does not
decrease the variance in the endpoint and the population can be re-analyzed as a whole.
Thus, stratification by using genetic data does not have a deleterious effect on the
experiment or trial, even in cases where it does not influence the outcome.)

2. Genotyping of the cohort.

Some or all of the markers are genotyped in the entire cohort of patients 
enrolled in the clinical trial. These data are then used either as inclusion/exclusion criteria 
(see 3a below) or to divide the cohort into subgroups (see 3b below). 

3a. Inclusion/exclusion of patients using genetic information.
If some or all of the polymorphisms are known to influence the test
or toxic response to a treatment, and may identify by virtue of unresponsiveness, a clinical
subset of patients that define a "different" disease. In short, a post facto genetic analysis
correlated with a specific clinical phenotype such as drug responsiveness or
unresponsiveness can reveal different etiologic mechanisms for the disease being treated.
This is especially likely in the case of ethnic differences among patients where each
ethnic group has a distinctive response to a treatment. Finally, analysis of phenotypic
markers can provide insight into genetic diversity of the subjects being treated, allowing
the clinician to alter enrollment in a drug trial to accommodate more or less genetic
diversity as is scientifically prudent.

III. Methods

A. General 

In the methods of the invention, members of a treated and control
(untreated) population having a biological condition of interest (e.g., a disease) are
characterized for polymorphic profile and a test parameter that is a measure of the
biological condition, assuming the members have not already been so characterized. The
members in the treated population have been (or are) treated according to a treatment
procedure, whereas the members of the control population have been (or are) treated
according to a control procedure.

To reduce total variance in the treatment assessment or study,
subpopulations from the treated and control populations are selected for similar genetic
composition such that the members in the two populations have similar or identical
polymorphic profiles. The polymorphic profile of the subpopulations includes one or
more polymorphic forms. Typically, the polymorphic profile includes a plurality of
polymorphic forms, generally at least 5, in other instances at least 10, and in still other
instances at least 100, or any number in between.

To minimize genetic variance between the treated and control
subpopulations, the polymorphic profiles for the two groups are selected to be similar.
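One simple way to realize this selection is to group subjects by their genotypes at a chosen set of markers and keep only the profiles represented in both populations. The Python sketch below assumes identical genotypes at every chosen marker as the similarity criterion, which is only one of the options the text allows; the function names, marker labels ("site1", "site2") and subject identifiers are hypothetical.

```python
from collections import defaultdict


def group_by_profile(population, markers):
    """Map each genotype combination at `markers` to the ids of subjects carrying it."""
    groups = defaultdict(list)
    for subject_id, profile in population.items():
        key = tuple(profile[marker] for marker in markers)
        groups[key].append(subject_id)
    return groups


def matched_subpopulations(treated, control, markers):
    """Keep only the profiles that occur in both the treated and control populations."""
    treated_groups = group_by_profile(treated, markers)
    control_groups = group_by_profile(control, markers)
    shared = treated_groups.keys() & control_groups.keys()
    return {key: (treated_groups[key], control_groups[key]) for key in shared}


# Hypothetical genotype calls, written as order-normalized allele pairs.
treated = {
    "T1": {"site1": "AA", "site2": "CT"},
    "T2": {"site1": "AG", "site2": "CC"},
    "T3": {"site1": "AA", "site2": "CT"},
}
control = {
    "C1": {"site1": "AA", "site2": "CT"},
    "C2": {"site1": "GG", "site2": "CC"},
}

matched = matched_subpopulations(treated, control, ["site1", "site2"])
print(matched)  # {('AA', 'CT'): (['T1', 'T3'], ['C1'])}
```

Requiring exact identity at many markers quickly empties the groups, so a looser criterion such as a percent-identity threshold may be preferable for larger profiles.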
responsiveness may be more important (and hence given more weight) than random
polymorphisms.

The polymorphisms can be in genomic DNA, RNA or cDNA. While any
polymorphisms can be used, those of particular import are polymorphisms in genes that
encode proteins that directly or indirectly influence a biochemical pathway that is
correlated with the biological condition being measured or observed. Thus, for example,
if a study involves assessing the efficacy of methods for treating patients having elevated
blood cholesterol levels, the polymorphic profile can be tailored to include
polymorphisms located in genes known to be involved in cholesterol synthesis and
metabolism.

Once appropriate subpopulations have been selected such that the
subpopulations have the desired level of similarity in polymorphic profile, a
determination is made whether there is a correlation between the polymorphic profile and
the efficacy (or lack thereof) of the treatment method by ascertaining whether there is a
statistically significant difference in a test parameter between the treated and control
subpopulations, where the test parameter is a measure or is representative of the efficacy
of the treatment for the biological condition shared by members of the subpopulation. A
finding of a statistically significant difference indicates that the polymorphic forms in the
polymorphic profile of the treated subpopulation correlate with the biological condition
(e.g., the polymorphic profile is correlated with a particular disease) and that the treatment
method under study is useful (or not beneficial) for treating subjects with the biological
condition.

As noted above, such correlations are particularly important, for example,
in clinical trials on a drug. In some instances, the correlation identifies a set of genetic
markers associated with the disease and thus has diagnostic value. In other instances, the
correlation identifies markers that are associated with a positive treatment result and thus
are important from a therapeutic standpoint.

A statistically significant difference in a test parameter between the
treatment and control subpopulations can be determined using standard methods of
statistical analysis. Methods include, for example, the analysis of variance, logistic
regression, cluster analysis, non-parametric statistics, contingency table tests and other
standard statistical tests.
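As one illustration of a non-parametric approach, the sketch below estimates a two-sided permutation p-value for the difference in mean test parameter between a treated and a control subpopulation. The permutation test is a choice made here for illustration and is not singled out by the text; the function name, the fixed random seed and the cholesterol-like values are assumptions.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical test-parameter values for two genetically matched subpopulations.
treated_values = [182, 175, 190, 168, 177, 185]
control_values = [221, 214, 230, 209, 225, 218]
p_value = permutation_p_value(treated_values, control_values)
print(p_value < 0.05)  # True
```

A small p-value indicates that a difference as large as the observed one rarely arises when group labels are assigned at random.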

B. Repetition of Method

The polymorphic profile of the subpopulation initially selected often does
not correlate with a statistically significant difference in the test parameter that is used to
measure the efficacy of treatment. In such instances, the method can be repeated with
different subpopulations created by using an alternative definition or measure of genetic
similarity, or by dividing the population into greater or fewer sub-populations. This
reflects the fact that there will rarely be a single unique way to group patients. Indeed, for
a study with N individuals, it will often be possible to form any number of
sub-populations from 1 (the entire population) to N (each individual in its own
sub-population). Repeating the process is often an effective way of detecting which
polymorphisms within the polymorphic profile are particularly informative with respect
to the test parameter of interest. Once a correlation is identified, additional cycles can be
repeated using, for example, a subset of the polymorphic forms utilized in an earlier cycle
to determine whether the subset might show an even greater correspondence with the test
parameter and thus treatment efficacy.
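The repetition described above can be sketched as a loop over candidate groupings. The following is a minimal illustration under stated assumptions: the marker names `m1` and `m2`, the records in `SUBJECTS`, and the use of a Welch-type t statistic as the ranking score are all hypothetical choices made for the example. Scanning many groupings in this way raises a multiple-comparison concern that the sketch does not address.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical records: (arm, genotype by marker, test parameter).
SUBJECTS = [
    ("treated", {"m1": "A", "m2": "G"}, 10.0),
    ("treated", {"m1": "A", "m2": "T"}, 12.0),
    ("treated", {"m1": "A", "m2": "G"}, 11.0),
    ("treated", {"m1": "A", "m2": "T"}, 13.0),
    ("treated", {"m1": "C", "m2": "G"}, 1.0),
    ("treated", {"m1": "C", "m2": "T"}, 0.0),
    ("treated", {"m1": "C", "m2": "G"}, 2.0),
    ("treated", {"m1": "C", "m2": "T"}, -1.0),
    ("control", {"m1": "A", "m2": "G"}, 0.0),
    ("control", {"m1": "A", "m2": "T"}, 1.0),
    ("control", {"m1": "A", "m2": "G"}, -1.0),
    ("control", {"m1": "A", "m2": "T"}, 2.0),
    ("control", {"m1": "C", "m2": "G"}, 1.0),
    ("control", {"m1": "C", "m2": "T"}, -1.0),
    ("control", {"m1": "C", "m2": "G"}, 0.0),
    ("control", {"m1": "C", "m2": "T"}, 2.0),
]

def welch_t(a, b):
    """Welch-type t statistic; returns None if it cannot be computed."""
    if len(a) < 2 or len(b) < 2:
        return None
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return None if se == 0 else (mean(a) - mean(b)) / se

def scan_groupings(subjects, markers):
    """Repeat the treated-versus-control comparison once per (marker, allele)."""
    results = []
    for marker in markers:
        for allele in sorted({geno[marker] for _, geno, _ in subjects}):
            t = [v for arm, geno, v in subjects
                 if arm == "treated" and geno[marker] == allele]
            c = [v for arm, geno, v in subjects
                 if arm == "control" and geno[marker] == allele]
            score = welch_t(t, c)
            if score is not None:
                results.append((marker, allele, score))
    # Groupings with the largest absolute statistic are listed first.
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)

ranked = scan_groupings(SUBJECTS, ["m1", "m2"])
for marker, allele, score in ranked:
    print(marker, allele, round(score, 2))
```

In this invented data set the subpopulation carrying allele A at marker `m1` ranks first, which is the kind of outcome that would prompt a further cycle restricted to that marker.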

Typically the polymorphic forms within a polymorphic profile evolve over
time to account for a greater proportion of the genetic component of the variance.
However, these polymorphic forms generally do not contribute equally. Some account
for more variance than others; markers that do not correlate with differences in the
treatment and control procedures are discarded from the analysis. The set of markers as a
collection has value distinct from the individual markers. This collection has enduring
value for understanding the genetic contribution to a distinct biological condition of
interest. Individual markers can have diagnostic utility, as can the collection.

When the methods of the invention are utilized in clinical trials, typically
subjects in the two groups are not undergoing any other treatments for their pathological
condition. In other instances, the study is designed so that subjects in both treated and
untreated groups are undergoing the same alternative treatment.



D. Treatment and Control Procedures

The types of treatment and control procedures vary according to the
biological condition to which the treatment is directed. As noted above, the biological
condition can be any of a number of conditions, such as a pathological condition or
simply a biological susceptibility. A variety of different procedures can be
performed when the biological condition is a pathological condition. In many instances,
the procedures involve administering a pharmaceutical agent, including, for example:
1) administering a pharmaceutical agent to members of the treated population and giving
members of the control population a placebo or nothing at all; 2) giving members of the
treated population one pharmaceutical agent (or combination of pharmaceutical agents)
and a different pharmaceutical agent (or combination of pharmaceutical agents) to the
control members; 3) providing one quantity of a pharmaceutical agent to the treated
population and a different amount to the control population; or 4) administering a
pharmaceutical agent to the treatment and control populations according to different
schedules.

Instead of administering a pharmaceutical agent, the treatment procedure
can include some type of behavioral therapy. Examples of such therapy include, but are
not limited to, a particular diet regime (e.g., low fat, low sodium, high protein, or a
restricted calorie diet), a prescribed exercise regime (e.g., exercising for a certain time
period a certain number of times a week, performing low-impact exercises, exercising to
reach a target heart rate, therapies that work certain muscle groups), meditation, yoga, and
stress reduction techniques. Of course, the treatment procedure can include combinations
of the foregoing procedures as well. Members in control groups may not undergo therapy
at all or may be treated in opposing fashion (or may already be engaged in contrary
behaviors). For example, if the treatment group is placed on a low caloric diet, members
in the control group can be placed on a high caloric diet or can simply be selected from
those whose normal diet already is a high caloric diet and thus is not altered.

The treatment procedure can also be directed towards a biological
susceptibility or resistance rather than a pathological condition. Thus, for example, in the
case of plants, plants can be treated with various agricultural agents used to affect plant
growth or health (e.g., fertilizer or other growth stimulants, herbicides, insecticides, and
pH altering agents) to assess the effect of such agents on various susceptibilities or
resistances of plants (e.g., susceptibility to frost or freeze damage and resistance to
herbicides). In like manner, humans or other organisms can also be treated with various
agents, for example vaccines, to determine the effect of the agents on various
susceptibilities or resistances.

E. Utility

The reduction in variance achieved by the methods of the invention
enables researchers to selectively optimize treatment studies. For example, as the genetic
variance decreases, the confidence level of the statistical analysis increases. Thus, with
the methods of the invention, researchers can more confidently attribute differences in
effects seen between the treated subjects and the control subjects to the treatment
administered, rather than to genetic differences between patients.
Furthermore, differences between control and test groups can be appreciated sooner. This
allows smaller, less costly studies to be performed that have the same statistical power as
much larger studies that do not match for the underlying genetics. Alternatively, a study
in which patients are matched for genetic factors will be able to detect a much smaller
difference in response between treated and untreated individuals than a study of the same
size that ignores genetic factors. This allows for less costly studies, more rapid
assessment regarding the feasibility and desirability of additional treatment studies, and
ultimately, in the case of clinical trials on pharmaceutical compounds for example, allows
for more rapid marketing of the pharmaceuticals.
The methods also enable more efficient treatment studies to be designed.
For instance, once polymorphisms that correlate with pathological conditions have been
identified, subjects that have the polymorphisms as well as the biological condition can be
identified and enrolled in additional studies to analyze the effect that other treatments
have on the biological condition of interest. Because subjects that will not respond to the
treatment are not enrolled, fewer subjects need to be enrolled. Alternatively, if a set of
polymorphisms emerges that, when matched between patients in the control and test arms
of a trial, is highly correlative with the biological condition being studied, subsequent
trials of the efficacy of a treatment can be conducted with fewer patients regardless of
response rate if the biological condition being measured has a genetic component. In
addition, when polymorphisms associated with differential response are identified, it may
be possible to tailor the dose a specific patient receives to be optimal given their
polymorphic profile. This will be particularly important when there are unwanted side
effects of the treatment and it is desirable to give the minimum efficacious dose.
Furthermore, as noted above, the treatment methods described herein
permit the identification of subsets of polymorphic forms that correlate with either a
favorable response or unresponsiveness to treatment, or an unwanted or toxic response to
a treatment. Clinical trials on the efficacy of certain pharmaceutical treatments can
identify individuals that are unresponsive to treatment and, in so doing, can in some
instances result in the identification of a clinical subset of patients that define a "different"
disease. Such correlations can also be used as a prognostic and/or diagnostic tool to
identify subjects having or likely to acquire a disease or to select appropriate treatment
procedures for a subject based upon the particular genetic composition of the subject.

Information gained from clinical trials in which patients are genotyped for
a set of polymorphic genetic markers can also be used in other stages of drug discovery
and development. For example, genes shown to be associated with response via the
polymorphic profile of the patients may be amenable to intervention and hence represent
potential drug targets. Furthermore, treatments that show low efficiency
(i.e., many non-responders) or that have high rates of adverse events can be identified by
examining the polymorphism profile of patients in early phase trials. This information

for the two subpopulations, the selecting step 102, the determining step 104, and the
displaying step 106 are repeated using subpopulations that have a polymorphic profile
that is different from that in earlier cycles.

Hence, the microprocessor in the computer system of the present invention
is operatively disposed relative to the system memory, the system bus and the
input/output so as to perform the foregoing functions. For example, the processor
provides or receives data that comprises designations for each member of the treated and
control populations, as well as designations for a polymorphic profile and a test parameter
for each member of the two populations. The microprocessor is also operatively disposed
to select a subpopulation from each of the treatment and control populations for similarity
in polymorphic profile, determine whether there is a statistically significant difference in
the test parameter between the subpopulations and display an output of the result
obtained.

The computer program of the invention includes code for providing or
receiving data comprising the various designations for the identity of the members of the
test and control populations, their polymorphic profiles and test parameter results. The
program also includes code necessary to perform the selecting, determining and
displaying steps set forth above.
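A minimal sketch of such a program is given below purely for illustration. It assumes a hypothetical data layout (dictionaries with `id`, `profile` and `value` keys), a hypothetical similarity measure (the fraction of marker positions shared with a reference profile), and a large-sample z-test as the determining step; none of these choices is dictated by the specification, and the normal approximation is crude for samples this small.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def similarity(profile_a, profile_b):
    """Fraction of marker positions at which two profiles agree."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return shared / len(profile_a)

def select(population, reference, threshold):
    """Selecting step: keep members whose profile resembles the reference."""
    return [m for m in population
            if similarity(m["profile"], reference) >= threshold]

def determine(treated, control, alpha=0.05):
    """Determining step: two-sided z-test on the difference in means."""
    t = [m["value"] for m in treated]
    c = [m["value"] for m in control]
    se = sqrt(variance(t) / len(t) + variance(c) / len(c))
    z = (mean(t) - mean(c)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value, p_value < alpha

def display(p_value, significant):
    """Displaying step: report the outcome."""
    verdict = "significant" if significant else "not significant"
    print(f"p = {p_value:.4g} ({verdict})")

# Hypothetical designations for members of the two populations.
treated_pop = [
    {"id": "T1", "profile": "AGTC", "value": 168},
    {"id": "T2", "profile": "AGTC", "value": 172},
    {"id": "T3", "profile": "AGTT", "value": 165},
    {"id": "T4", "profile": "CCGA", "value": 204},
    {"id": "T5", "profile": "AGTC", "value": 170},
]
control_pop = [
    {"id": "C1", "profile": "AGTC", "value": 198},
    {"id": "C2", "profile": "AGTT", "value": 203},
    {"id": "C3", "profile": "AGTC", "value": 195},
    {"id": "C4", "profile": "CCGA", "value": 207},
    {"id": "C5", "profile": "AGTC", "value": 200},
]

sub_t = select(treated_pop, reference="AGTC", threshold=0.75)
sub_c = select(control_pop, reference="AGTC", threshold=0.75)
p_value, significant = determine(sub_t, sub_c)
display(p_value, significant)
```

Repeating the cycle with a different reference profile or threshold corresponds to re-running the selecting, determining and displaying steps on different subpopulations.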

V. Methods for Determining Polymorphic Profiles

A. Preparation of Samples 

Polymorphisms are detected in a target nucleic acid from an individual
being analyzed. For assay of genomic DNA, virtually any biological sample (other than
pure red blood cells) is suitable. For example, convenient tissue samples include whole
blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. For assay
of cDNA or mRNA, the tissue sample must be obtained from an organ in which the target
nucleic acid is expressed. For example, if the target nucleic acid is a cDNA encoding
cytochrome P450, the liver is a suitable source.

monomer units are added after amplification to specific nucleotides or to non-amplified
nucleic acids prior to separation on the basis of size (e.g., by capillary electrophoresis).

5. Isozyme Markers 

Other embodiments include identification of isozyme markers and
allele-specific hybridization. Isozymes are a group of enzymes that catalyze the same reaction
but vary in physical properties resulting from differences in amino acid sequence (and
hence nucleic acid sequence). Some isozymes are multimeric enzymes containing
slightly different subunits. Other isozymes are either multimeric or monomeric but have
been cleaved from the proenzyme at different sites in the amino acid sequence. Nucleic
acid variation of isozymes can be determined by hybridizing primers that flank a variable
portion of an isozyme nucleic acid sequence to target nucleic acids contained in a sample
obtained from an organism. The variable region is amplified and sequenced. From the
sequence, the different isozymes are determined and linked to phenotypic characteristics.

6. Amplified Variable Sequences 

Amplified variable sequences of the genome and complementary nucleic
acid probes also can be used as polymorphic markers. The phrase "amplified variable
sequences" refers to amplified sequences of the genome that exhibit high nucleic acid
residue variability between members of the same species. All organisms have variable
genomic sequences and each organism (with the exception of a clone) has a different set
of variable sequences. The presence of a specific variable sequence can be used to predict
phenotypic traits. A variable sequence of DNA can be amplified (utilizing the
amplification techniques listed above) by template-dependent extension of primers that
hybridize to flanking regions of the DNA obtained from a subject. The amplified
products can then be sequenced.

7. Allele-Specific Primers and Hybridization

An allele-specific primer hybridizes to a site on target DNA overlapping a
polymorphism and only primes amplification of an allelic form to which the primer
exhibits perfect complementarity. This primer is used in conjunction with a second
primer that hybridizes at a distal site. Amplification proceeds from the two primers and
produces a detectable amplified product that can be characterized for the particular allelic
form present in a nucleic acid sample. See, e.g., Gibbs, Nucleic Acids Res. 17:2427-2448
(1989) and WO 93/22456.

8. Single-Strand Conformation Polymorphism Analysis
Alleles of target sequences can be differentiated using single-strand
conformation polymorphism analysis, which identifies base differences by alteration in
electrophoretic migration of single-stranded PCR products (see, e.g., Orita, et al., Proc.
Nat'l Acad. Sci. USA 86:2766-2770 (1989)). Typically, amplified PCR products are
denatured (e.g., according to known chemical or thermal methods) to form
single-stranded amplification products that can refold or form secondary structures, depending in
part upon the base sequence of the product. The different electrophoretic mobilities of
single-stranded amplification products can be related to base-sequence differences between
alleles of target sequences.

9. Self-Sustained Sequence Replication
Polymorphisms can also be identified by self-sustained sequence
replication. In this approach, target nucleic acid sequences are amplified (replicated)
exponentially in vitro under isothermal conditions using three enzymatic activities
involved in retroviral replication: (1) reverse transcriptase, (2) RNase H, and (3) a
DNA-dependent RNA polymerase (Guatelli, et al., Proc. Natl. Acad. Sci. USA 87:1874 (1990)).
By mimicking the retroviral strategy of RNA replication by means of cDNA
intermediates, cDNA and RNA copies of the original target are accumulated.

10. Arbitrary Fragment Length Polymorphisms (AFLP)
Arbitrary fragment length polymorphisms (AFLP) can also be used as
polymorphisms (Vos, et al., Nucl. Acids Res. 23:4407 (1995)). The phrase "arbitrary
fragment length polymorphism" refers to selected restriction fragments that are amplified
before or after cleavage by a restriction endonuclease. The amplification step permits

primer and at least one nucleotide (typically labelled), that is complementary to the base
occupying the polymorphic site in one allelic form. If that allelic form is present, then the
primer is extended and becomes labelled. In some methods, biallelic polymorphic sites
are analyzed by including two differentially labelled dideoxynucleotides respectively
complementary to bases occupying the polymorphic site in first and second allelic forms
of the target. Analysis of label present in the extended primer indicates whether one or
both of the allelic forms are present in a target sample.

C. High Throughput Screening 

In some instances, identification of polymorphisms is done by high
throughput screening. In one embodiment, high throughput screening involves providing
a library of polymorphic forms of DNA including RFLPs, AFLPs, isozymes, specific
alleles and variable sequences, including SSR. Such "libraries" are then screened against
genomic DNA from the subjects in the treatment study. Once the polymorphic alleles of
a subject have been identified, a link between the polymorphic DNA and the treatment
effect can be determined through statistical associations.
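As one hedged illustration of such a statistical association, the presence or absence of an allele can be cross-tabulated against response to treatment and examined with a contingency table test. The sketch below uses the Pearson chi-square statistic for a 2x2 table without continuity correction; the counts are invented for the example, and the function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the table

                       responders   non-responders
        allele present      a             b
        allele absent       c             d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts from a genotyped treatment arm.
chi2, p_value = chi_square_2x2(30, 10, 15, 25)
print(f"chi-square = {chi2:.3f}, p = {p_value:.5f}")
```

With small expected counts an exact test would usually be preferred over this approximation.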

Such high throughput screening can be performed in many different
formats. For example, for those methods involving hybridization reactions, hybridization
can be performed in a 96-, 324-, or a 1024-well format or in a matrix on a silicon chip. In
a well-based format, a dot blot apparatus is used to deposit samples of fragmented and
denatured genomic DNA on a nylon or nitrocellulose membrane. After cross-linking the
nucleic acid to the membrane, either through exposure to ultra-violet light if nylon
membranes are used or by heat if nitrocellulose is used, the membrane is incubated with a
labeled hybridization probe. The membranes are washed extensively to remove
non-hybridized probes and the presence of the label on the probe is determined.

The labels are incorporated into the nucleic acid probes by any of a number
of methods well known to those of skill in the art. In some instances, a label is
simultaneously incorporated during the amplification procedure in the preparation of the
nucleic acid probes. Thus, for example, polymerase chain reaction (PCR) with labeled

In another high throughput format, multiple capillary tubes are placed in a
capillary electrophoresis apparatus. Samples are loaded onto the tubes and
electrophoresis of the samples is run simultaneously. See, for example, Mathies &
Huang, Nature 359:161 (1992). Because the separation matrix is of low viscosity, after
each run, the capillary tubes can be emptied and reused.

The following examples are offered to further illustrate specific aspects of 
the present invention and are not to be interpreted so as to limit the scope of the present 
invention. 

EXAMPLE 1 

Effect of Genetic Matching on Sample Size and Confidence

The invention can be illustrated by the example of studying serum
cholesterol and the effects drugs may have on this biological condition. It has been
established that up to 80% of the variance in serum cholesterol can be attributed to
genetics (see, for example, R. A. King, J. I. Rotter & A. G. Motulsky, The Genetic
Basis of Common Diseases, Oxford University Press, 1992).

The utility of matching patients at genetic loci is that the confidence, sample
size or discriminating power of a given study can be favorably affected by genetic matching.
For example, if a study is required to have 80% power to detect a difference of 20 mg/dl at
the 5% level, with za = 1.645 (the value of the standard normal distribution
that is exceeded in 5% of cases) and zb = 0.842 (the value of the standard normal
distribution that is exceeded in 20% of cases, i.e., its 80th percentile), the minimum sample
size may be calculated as follows:

za <- qnorm(al)    [al = 0.95, za = 1.645]

zb <- qnorm(be)    [be = 0.8, zb = 0.842]

If the genetic contribution to the variance of a variable x is 80% and the variance is 1600
(the standard deviation squared for cholesterol) then the sample size is:

$$N = \frac{2 \times (z_a + z_b)^2 \times \text{variance}}{(\text{difference})^2}$$
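The calculation can be sketched numerically as follows. This is an illustrative example only: the function name `sample_size_per_arm` is hypothetical, and the matched case rests on the simplifying assumption that genetic matching removes the entire 80% genetic share of the variance, leaving a residual variance of 320.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(variance, difference, alpha=0.05, power=0.80):
    """Sample size from N = 2 * (za + zb)^2 * variance / difference^2."""
    za = NormalDist().inv_cdf(1 - alpha)  # about 1.645 at the 5% level
    zb = NormalDist().inv_cdf(power)      # about 0.842 for 80% power
    return ceil(2 * (za + zb) ** 2 * variance / difference ** 2)

# Unmatched study: full variance of 1600, difference of 20 mg/dl.
n_unmatched = sample_size_per_arm(1600, 20)

# Matched study (assumption: all of the 80% genetic variance is removed).
n_matched = sample_size_per_arm(1600 * (1 - 0.80), 20)

print(n_unmatched, n_matched)
```

Under these assumptions the required sample size falls roughly in proportion to the variance that matching removes; partial removal of the genetic variance would give an intermediate figure.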
is a set of random variables, each describing an individual in the sample, the expectation
of the mean response in the sample is given by,

$$E[\bar{x}] = \frac{1}{N} \sum_{i=1}^{N} E[x_i], \qquad (1)$$

where $E[x_i] = \mu_A$ with probability $p$ and $E[x_i] = \mu_B$ with probability $q = 1 - p$. So
the mean of the distribution of the sample mean is,

$$E[\bar{x}] = p\mu_A + q\mu_B. \qquad (2)$$

The variance of the mean response in such a case depends on both the
variances of each of the two distributions (A and B) and on the difference between the
means of these distributions. This variance can be expressed in terms of the sum of the
variances of the individuals,

$$V[\bar{x}] = \frac{1}{N^2} \sum_{i=1}^{N} V[x_i]. \qquad (3)$$

When a random variable $Y$ is defined such that when $Y = 0$, $V[X_i \mid Y = 0] = \sigma_A^2$ and
when $Y = 1$, $V[X_i \mid Y = 1] = \sigma_B^2$, the expectation of this variance is then,

$$E[V[X_i \mid Y]] = p\sigma_A^2 + q\sigma_B^2. \qquad (4)$$

the two genetic backgrounds have the same frequency, then $p = q = 0.5$ and the
distribution of the sample mean is characterized as,

$$\left( \frac{\mu_A + \mu_B}{2}, \; \frac{1}{N}\left[ \frac{\sigma_A^2 + \sigma_B^2}{2} + \frac{(\mu_A - \mu_B)^2}{4} \right] \right).$$

In some instances, all individuals from the sub-population are genotyped for $k$ equally
informative markers. This is sometimes the case when markers are chosen at random (i.e.,
if nothing is known about genes involved in responsiveness). Additional markers will
usually provide decreasing information (i.e., though the $(k+1)$th marker increases the
probability of correctly assigning an individual to a sub-population, it provides less
information than the $k$th marker); this does not necessarily have to be the case but often
is the case. For example, if there is a priori knowledge of the genes involved in response,
these are typically examined first. Alternatively, if there is no information about the
underlying genetics of response, then the genetic matching is based on relatedness (i.e.,
the overall degree of genetic similarity in the genome) and hence the first few markers
will be highly informative with diminishing information from each additional marker.

Consider a simple model where the probability of assigning an individual
to the correct genetic sub-population when $k$ markers have been genotyped is given by,

$$P[\text{correct} \mid k] = \frac{1}{2}\left(1 + \frac{k}{k+1}\right). \qquad (8)$$

As $k$ tends to infinity, the probability of correct assignment asymptotes to 1. This
probability can be used in the equations above to determine the mixture of the sampled
population. For $k \to \infty$, $p \to 1$ in equations 2 and 6, so the mean and variance of the
sample mean are given by $(\mu_A, \sigma_A^2/N)$ for population A and $(\mu_B, \sigma_B^2/N)$ for population
B (using the information to set $p = 0$ and so select non-responders) as expected.
are identical in the treated and control populations for both sub-population A and
sub-population B, the sample size is infinity). The sample size also
increases as the variances of the sub-populations increase.

D. Example of the sample size for matched and unmatched populations. 

In one instance, the two genetic sub-populations have the same response
characteristics when no treatment is administered ($\mu_A = \mu_B = 0$, $\sigma_A = \sigma_B = 8$) and the
mean response to treatment of individuals from group A is described by $\mu_A = 5$, $\sigma_A = 8$.
Response to treatment for individuals from group B is the same as for the placebo (i.e.,
they are non-responders) with $\mu_B = 0$, $\sigma_B = 8$. Further, in this example, a 5%
significance level ($\alpha = 0.05$) is used and the sample size represents the minimum number
of individuals needed for 80% power ($\beta = 0.8$). Table 2 below gives the number of
markers ($k$), the probability of the selected individual coming from group A (i.e., being
correctly identified as a responder) ($p$), the variance of the sample mean for the treated
population ($V[x]$) and the sample size required in each arm of the trial ($N$).



TABLE 2

k       p       V[x]    N
0       0.50    70      83
1       0.75    69      36
5       0.92    66      24
10      0.95    65      22
∞       1.00    64      20
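The first three columns of Table 2 can be checked with a short calculation. The sketch below is an illustration under stated assumptions: it takes the probability of correct assignment from equation 8 and assumes that the tabulated variance is the variance of a two-component mixture, i.e. the expected within-group variance plus the between-group term $pq(\mu_A - \mu_B)^2$. The function names are hypothetical, and no attempt is made to reproduce the N column, because the sample size expression is not fully legible in this text.

```python
def p_correct(k):
    """Equation 8: probability of correct assignment with k markers."""
    return 0.5 * (1 + k / (k + 1))

def mixture_variance(p, mu_a, mu_b, var_a, var_b):
    """Variance of a mixture: within-group part plus between-group part."""
    q = 1 - p
    return p * var_a + q * var_b + p * q * (mu_a - mu_b) ** 2

# Parameters of the example: responders have mean 5, non-responders mean 0,
# and both groups have standard deviation 8 (variance 64).
rows = []
for k in (0, 1, 5, 10):
    p = p_correct(k)
    v = mixture_variance(p, mu_a=5, mu_b=0, var_a=64, var_b=64)
    rows.append((k, round(p, 2), round(v)))

for row in rows:
    print(row)
```

With no markers genotyped the between-group term contributes 6.25 to the variance, and that contribution shrinks as more markers are genotyped and the treated sample becomes more homogeneous.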



In this example, the within population variance is fixed at 64 for both 
genetic subpopulations and in both treated and untreated samples. Table 2 shows that, 
when no markers are genotyped, the variance is 70. This increase in the variance is 
entirely due to the difference in the response due to the underlying genotype. When this 
EXAMPLE 4

Effects of the number of markers and their allele frequencies on the power of the 
polymorphic profile to discriminate between distinct groups of patients. 

In this example, there are two types of markers, those with common alleles
(i.e., the two alleles are at similar frequency) and those with a rare allele. For the first
marker-type, one allele has frequency $p_1 = 0.5$ in the sub-population A (responders) and
$q_1 = 0.4$ in the sub-population B (non-responders). For the second type of markers, one
allele has frequency $p_2 = 0.1$ in sub-population A and $q_2 = 0.08$ in sub-population B. In
both cases, the rare allele has a 20% lower frequency in sub-population B compared to its
frequency in sub-population A. A set of markers, $k$ in number, are genotyped in a sample
of 2000 patients who are known to belong either to sub-population A or sub-population B
(from a previous clinical trial). For each of these individuals there are $k$ observations
$x_1, x_2, x_3, \ldots, x_k$ which take the value 0 if the rare allele is present and 1 otherwise. These
individuals can be classified into sub-populations with $y = 0$ if they come from
sub-population A and $y = 1$ if they belong to sub-population B. Using such training data, 2000
individuals were generated and assigned to one sub-population or the other using a linear
logistic model (Christensen, Log Linear Models and Logistic Regression, Springer Verlag,
New York, 1997) of the form,

$$\log\left[\frac{P(y=1)}{P(y=0)}\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k$$

Other statistical methods (such as those described in Example 3) can also be used.
This linear logistic model was chosen to illustrate another method of classification.
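A simulation in the spirit of this example can be sketched as follows. Several simplifications are assumptions of the sketch rather than features of the original study: the allele frequency is treated as the probability that the rare allele is present (x = 0), the two sub-populations are taken to be equally frequent, and the coefficients of the linear logistic form are computed directly from the known frequencies instead of being fitted to training data. The function name `simulate_accuracy` is hypothetical.

```python
import math
import random

def simulate_accuracy(k, p_a, p_b, n=2000, seed=1):
    """Fraction of n simulated individuals assigned to the correct group."""
    rng = random.Random(seed)
    # Log-odds contributions of one marker toward y = 1 (sub-population B).
    w_present = math.log(p_b / p_a)             # x = 0, rare allele present
    w_absent = math.log((1 - p_b) / (1 - p_a))  # x = 1, rare allele absent
    # Linear form: log-odds = beta0 + beta * (x_1 + ... + x_k).
    beta0 = k * w_present
    beta = w_absent - w_present
    correct = 0
    for _ in range(n):
        y = rng.randrange(2)  # 0 = sub-population A, 1 = sub-population B
        p_rare = p_a if y == 0 else p_b
        x_sum = sum(0 if rng.random() < p_rare else 1 for _ in range(k))
        log_odds = beta0 + beta * x_sum
        guess = 1 if log_odds > 0 else 0  # ties are assigned to A
        correct += guess == y
    return correct / n

# Common-allele markers: frequency 0.5 in A and 0.4 in B.
acc_2 = simulate_accuracy(2, 0.5, 0.4)
acc_100 = simulate_accuracy(100, 0.5, 0.4)
print(f"k = 2: {acc_2:.2f}   k = 100: {acc_100:.2f}")
```

Because every marker carries the same information in this setup, all coefficients are equal and only the count of markers lacking the rare allele matters; a fitted model on real data would generally give unequal coefficients.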

Table 4 gives the probability of assigning an individual to the correct
sub-population for 2, 5, 10, 20 and 50 markers. Values are given for both types of markers and
for a mixture of the two. In this example, all markers are assumed to be independent of one
another. If this were not the case, other, more powerful, statistical methods can be applied

Trees, CRC Press, 1984).



TABLE 4

Number of markers (k)                                          2       5       10      20      50      100
Common allele (p1 = 0.5, q1 = 0.4)                             0.58    0.58    0.62    0.66    0.75    0.84
Rare allele (p2 = 0.1, q2 = 0.08)                              0.50    0.53    0.55    0.56    0.57
Equal mix of (p1 = 0.5, q1 = 0.4) and (p2 = 0.1, q2 = 0.08)    0.50    0.58    0.57    0.62    0.70    0.7?



In these simulations, 
It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included within the spirit and purview of
this application and scope of the appended claims. All publications, patents, and patent
applications cited herein are hereby expressly incorporated by reference in their entirety for
all purposes to the same extent as if each individual publication, patent or patent application
were specifically and individually indicated to be so incorporated by reference.
15. The method of claim 1, wherein the subpopulations are humans,
animals or plants.

16. The method of claim 15, wherein the subpopulations are humans.

17. The method of claim 15, wherein the subpopulations are plants.

18. The method of claim 1, wherein the subpopulations are bacteria.

19. The method of claim 1, wherein the subpopulations of subjects are
selected as having been similarly exposed to an environmental factor.

20. The method of claim 1, wherein the subpopulations of subjects are
selected as having been differentially exposed to at least one environmental factor.

21. The method of claim 1, wherein the subpopulations of subjects are
selected as being from the same ethnic group.

22. The method of claim 1, wherein the subpopulations of subjects are
selected for a common phenotypic trait.

23. The method of claim 1, wherein the subpopulations from the
treatment and control populations each include at least 5 members.

24. The method of claim 23, wherein the subpopulations each include
at least 10 members.

25. The method of claim 24, wherein the subpopulations each include
at least 100 members.

26. The method of claim 1, wherein the polymorphic profile for each of
the subpopulations is a single polymorphic form.

27. The method of claim 1, wherein the polymorphic profile for each of
the subpopulations comprises a plurality of polymorphic forms.
49. A system for assessing a treatment procedure, comprising:
(a) a memory;
(b) a system bus; and
(c) a processor operatively disposed to
(i) provide or receive data comprising
designations for each member of a treated population
having been treated according to a treatment procedure and each member of a control
population treated according to a control procedure;
designations for a polymorphic profile for each member of
the treated and control populations; and
designations for a test parameter for each member of the
treated and control populations;
(ii) select a subpopulation from each of the treated and control
populations that have a similar polymorphic profile;
(iii) determine whether there is a statistically significant
difference in the test parameter between the subpopulations; and
(iv) display an output of the result from step (iii).

50. A method of conducting a clinical trial, comprising
(a) determining a polymorphic profile for individuals in a population
having the same disease, wherein the polymorphic profile includes at least one
polymorphic form at a polymorphic site not known to be associated with the disease;
(b) selecting a subpopulation of individuals having a similar
polymorphic profile from the population;
(c) administering a treatment regime to a treatment group within the
subpopulation and a control regime to a control group within the subpopulation;
(d) determining a test parameter in patients in the treatment group and
the control group expected to vary in response to an effective treatment regime; and
(e) determining whether the parameter shows a statistically significant
difference between the treatment group and the control group.
(54) Title: METHODS TO REDUCE VARIANCE IN TREATMENT STUDIES USING GENOTYPING

(57) Abstract

The present invention provides methods, computer programs
and computerized systems useful for evaluating the efficacy of various
types of treatment procedures (e.g., clinical trials) as a function of the
genotype of a subject. By matching treatment and control groups
genetically (100) the methods and systems of the invention reduce the
total variance of the study, thereby allowing trials examining the
efficacy or effect of treatment procedures to be conducted with fewer
subjects, with increased confidence values, and/or with increased
precision or discriminatory power. Certain methods of the invention
involve selecting treated and control subpopulations of subjects from
treated and control populations for similarity in polymorphic profile
(102) wherein the treated and control populations have been treated
with a treatment and control procedure, respectively. A determination
is then made whether there is a statistically significant difference
(104) in a test parameter between the treated and control
subpopulations as an assessment of the test procedure (108).